Refined Learning Bounds for Kernel and Approximate k-Means
Kernel $k$-means is one of the most popular approaches to clustering, and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order $\mathcal{O}(k/\sqrt{n})$, which does not match the known lower bound $\Omega(\sqrt{k/n})$ in terms of $k$, where $k$ is the number of clusters and $n$ is the size of the training set. In this paper, we study the statistical properties of kernel $k$-means and Nyström-based kernel $k$-means, and obtain optimal clustering risk bounds that improve on the existing ones. In particular, based on a refined upper bound on the Rademacher complexity [21], we first derive an optimal risk bound of rate $\mathcal{O}(\sqrt{k/n})$ for the empirical risk minimizer (ERM), and then extend it to general cases beyond ERM. We further analyze the statistical effect of the computational approximation in Nyström kernel $k$-means, and prove that it achieves the same statistical accuracy as exact kernel $k$-means using only $\Omega(\sqrt{nk})$ Nyström landmark points. Finally, we relax the requirement on the number of landmark points from $\Omega(\sqrt{nk})$ to $\Omega(\sqrt{n})$ under a mild condition.
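As a rough illustration of the approximation scheme the abstract describes (this is not the authors' code; the RBF kernel choice, the farthest-point initialization, and all function names are assumptions), the following sketch builds a rank-$m$ Nyström feature map from $m = \lceil\sqrt{nk}\rceil$ uniformly sampled landmark points and then runs Lloyd's iterations in that feature space:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel computed from pairwise squared distances
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, m, gamma=1.0, seed=0):
    # Rank-m Nystrom feature map Phi ~= K_nm K_mm^{-1/2},
    # so that Phi @ Phi.T approximates the full n x n kernel matrix.
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = rbf_kernel(X, landmarks, gamma)          # n x m cross-kernel
    K_mm = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    w, V = np.linalg.eigh(K_mm + 1e-8 * np.eye(m))  # jitter for stability
    return K_nm @ V @ np.diag(w ** -0.5) @ V.T

def nystrom_kernel_kmeans(X, k, gamma=1.0, iters=50, seed=0):
    n = len(X)
    m = int(np.ceil(np.sqrt(n * k)))   # Omega(sqrt(nk)) landmark points
    Phi = nystrom_features(X, m, gamma, seed)
    # farthest-point initialization, then Lloyd's algorithm in feature space
    centers = [Phi[0]]
    for _ in range(1, k):
        d = ((Phi[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(Phi[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((Phi[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Phi[labels == j].mean(0)
    return labels
```

The point of the $\Omega(\sqrt{nk})$ result is that the feature matrix above has only $O(n\sqrt{nk})$ entries instead of the $n^2$ entries of the full kernel matrix, while (per the theorems in the paper) the clustering risk remains of the optimal order.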
Supplementary Material for "Refined Learning Bounds for Kernel and Approximate k-Means"
[Extraction residue from the supplementary proofs; the displayed inequalities did not survive. Recoverable structure: Lemma 2 asserts the existence of a covering set $C \subset \mathcal{H}$, and the bound in Lemma 2 does match. Theorem 1 is proved via two auxiliary lemmas, beginning with Lemma 5; Theorem 4 is proved via Lemma 7, after which a bound on $\min(k, \Xi)$ is substituted into Theorem 4 to complete the proof. The resulting upper bounds are then compared against the lower bound.]